Utilization of Suffix Array for Quick STD and Its Evaluation on the NTCIR-9 SpokenDoc Task

Authors

  • Kouichi Katsurada
  • Koudai Katsuura
  • Yurie Iribe
  • Tsuneo Nitta
Abstract

We propose a technique for quickly detecting keywords in a very large speech database without requiring a large amount of memory. To accelerate the search and reduce memory use, we employ a suffix array as the data structure and apply phoneme-based DP matching to it. To avoid an exponential increase in processing time with keyword length, a long keyword is divided into short sub-keywords. Moreover, an iterative lengthening search algorithm is used to output accurate search results quickly.
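The ideas named in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes a phoneme transcript held in memory as a list of strings, uses plain Levenshtein distance as the DP matching cost, compares each sub-keyword only against windows of the same length, and implements iterative lengthening simply as re-running the search with a growing distance budget. All function names and parameters (`build_suffix_array`, `match_windows`, `max_sub_len`, `max_threshold`) are hypothetical.

```python
def build_suffix_array(phonemes):
    """Start positions of all suffixes, sorted lexicographically (naive build)."""
    return sorted(range(len(phonemes)), key=lambda i: phonemes[i:])


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (simple DP matching)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def split_keyword(keyword, max_sub_len):
    """Divide a long keyword into consecutive sub-keywords."""
    return [keyword[i:i + max_sub_len] for i in range(0, len(keyword), max_sub_len)]


def match_windows(phonemes, sa, sub, threshold):
    """Map position -> distance for windows within `threshold` of `sub`.

    Identical windows are adjacent in suffix-array order, so each distinct
    window is scored only once (a simplification of DP over the array).
    """
    n = len(sub)
    hits, prev_window, prev_dist = {}, None, None
    for pos in sa:
        window = phonemes[pos:pos + n]
        if len(window) < n:
            continue  # suffix too short to hold the sub-keyword
        if window != prev_window:
            prev_window, prev_dist = window, edit_distance(sub, window)
        if prev_dist <= threshold:
            hits[pos] = prev_dist
    return hits


def search(phonemes, sa, keyword, max_sub_len, threshold):
    """Find keyword candidates whose summed sub-keyword distance fits the budget."""
    subs = split_keyword(keyword, max_sub_len)
    results = {}
    for pos, dist in match_windows(phonemes, sa, subs[0], threshold).items():
        total, offset = dist, pos + len(subs[0])
        for sub in subs[1:]:
            total += edit_distance(sub, phonemes[offset:offset + len(sub)])
            offset += len(sub)
        if total <= threshold and offset <= len(phonemes):
            results[pos] = total
    return results


def iterative_lengthening_search(phonemes, sa, keyword, max_sub_len, max_threshold):
    """Yield (position, distance) pairs, best matches first, by raising the budget."""
    reported = set()
    for threshold in range(max_threshold + 1):
        hits = search(phonemes, sa, keyword, max_sub_len, threshold)
        for pos in sorted(hits):
            if pos not in reported:
                reported.add(pos)
                yield pos, hits[pos]
```

For example, with the made-up transcript `"k o N n i ch i w a k o N b a N w a".split()`, the keyword `"k o N b a N".split()`, `max_sub_len=3`, and `max_threshold=2`, the generator reports a single hit at position 9 with distance 0. Fixing the sub-keyword boundaries in this way keeps each DP table small, at the cost of missing matches whose insertions or deletions shift those boundaries.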

Similar resources

Using Multiple Speech Recognition Results to Enhance STD with Suffix Array on the NTCIR-10 SpokenDoc-2 Task

We have previously proposed a fast spoken term detection method that uses a suffix array as a data structure. By applying dynamic time warping on a suffix array, we achieved very quick keyword detection from a very large-scale speech document. In this study, we modify our method so that it can deal with multiple recognition results. By using these results obtained from various speech recognizer...

Utilizing Confusion Network in the STD with Suffix Array and Its Evaluation on the NTCIR-11 SpokenQuery & Doc SQ-STD Task

The authors have proposed a fast spoken term detection method that uses a suffix array as a data structure. This method enables very quick and memory-saving search by using such techniques as keyword division, dynamic time warping, and an articulatory-feature-based local distance definition. In this paper, we investigate a new approach that utilizes a confusion network in the suffix array. T...

Spoken Document Retrieval Experiments for SpokenDoc at Ryukoku University (RYSDT)

In this paper, we describe the spoken document retrieval systems of Ryukoku University, which participated in the NTCIR-9 IR for Spoken Documents (“SpokenDoc”) task. The NTCIR-9 “SpokenDoc” task has two subtasks: the “Spoken term detection (STD) subtask” and the “Spoken document retrieval (SDR) subtask”. We participated in both subtasks as team RYSDT. In this paper, first, our STD systems are de...

STD and SCR Techniques and Their Evaluations on the NTCIR-10 SpokenDoc-2 Task

This paper describes spoken term detection (STD) and spoken contents retrieval (SCR) techniques and their evaluations at the NTCIR-10 SpokenDoc-2 task. First of all, we describe our STD technique using a phoneme transition network (PTN) derived from multiple speech recognizers’ outputs and its evaluations at the STD and the iSTD (inexistent STD) tasks. Next, we introduce our SCR technique usin...

STD based on Hough Transform and SDR using STD results: Experiments at NTCIR-9 SpokenDoc

In this paper, we report our experiments at the NTCIR-9 IR for Spoken Documents (SpokenDoc) task. We participated in both the STD and SDR subtasks of SpokenDoc. For the STD subtask, we applied a novel indexing method, called metric subspace indexing, which we proposed previously. One of the distinctive advantages of the method is that it can output the detection results in increasing order of distance witho...

Publication date: 2011